Skip to content

Conversation

@songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Nov 6, 2025

Description

Add sort performance enhancement of pushing down sort by complex expressions to scan.

Related Issues

Resolves #3912

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuancu yuancu added enhancement New feature or request backport 2.19-dev labels Nov 7, 2025
project.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE);

// Branch 1: Check if complex expressions are already sorted by scan
if (handleComplexExpressionsSortedByScan(call, project, toTraits, toCollation)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some IT and ExplainIT for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of sort expression pushdown ITs are for this case.

Copy link
Member

@LantaoJin LantaoJin Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: which one for example?
and can this predicate move to definition of ExpandCollationOnProjectExprRule.Config (can reduce node copy)

Signed-off-by: Songkan Tang <[email protected]>
LantaoJin
LantaoJin previously approved these changes Nov 12, 2025
b2.operand(CalciteLogicalIndexScan.class)
.predicate(
Predicate.not(
CalciteLogicalIndexScan
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems duplicated with the below predication of noAggregatePushed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use noAggregatePushed now


// Check if the scan has sort expressions pushed down
if (scan.getPushDownContext().stream()
.noneMatch(operation -> operation.type() == PushDownType.SORT_EXPR)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the pushed SORT(i.e. SORT_EXPR with all digests are simple expression) can satisfy sort collation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible. The cost computation has coarse results by multiplying field count with 1.1. So VolcanoPlanner will prefer regular field sort pushdown.

Another option is to calculate the accurate complex expression count and multiply this count with 1.1. Put a check in the rule to not pushdown if all of expressions are simple expressions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, in this query ... | sort a | eval b = a + 1 | sort b, when pushing the sort b, this method will always return false. But I expect it return true.

Copy link
Contributor Author

@songkant-aws songkant-aws Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it returns false in this case but doesn't matter. Now, if all of expressions are simple, it will go through regular pushDownSort. Field sort exists in regular traitset and is propagated well. It will be handled by handleSimpleExpressionFieldSorts method in ExpandCollationOnProjectExprRule.

Integer offsetValue = LimitIndexScanRule.extractOffsetValue(sort.offset);
newScan = (CalciteLogicalIndexScan) scan.pushDownLimit(sort, limitValue, offsetValue);
} else {
// Attempt to push down sort expressions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still push down sort although scanProvidesRequiredCollation is true? I think we could just remove the sort directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if first sort by [a, b] expressions is already in scan. The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.

For topK pushdown, it could be also applied. If sort by [a, b], limit 10 is pushed, say the second sort by [a], limit 2 comes in, we should allow it pushdown as well. This may need replace existing SORT_EXPR and LIMIT pushdown context.

But right now, it has the problem that multiple different sort complex expressions doesn't work well yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.

Shall we do the same thing for the above branch then? It only push down limit when scanProvidesRequiredCollation is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisit the logic, for already pushed down topK and if the scan collation satisfies the output collation, we can still leave the old sort in the scan(remove the new sort in the meantime). It's kind of keeping semantics. For regular sort pushdown, we can override by the new sort.

throw new UnsupportedScriptException(
"ScriptQueryExpression requires a valid current time from hook, but it is not set");
}
Map<String, Object> mergedParams = new HashMap<>(params);
Copy link
Collaborator

@qianheng-aws qianheng-aws Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use LinkedHashMap here or the final plan may be unstable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use LinkedHashMap now

OSRequestBuilderAction action =
requestBuilder -> {
for (SortExprDigest info : sortExprDigests) {
requestBuilder.pushDownSortExpression(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Will this method throw exception by any chance? We plan to make the request builder transformation process lazy. We should put any logic which may throw exception outside of the lambda function, so it can avoid failure on the final plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, move most of logic that could throw exception outside. And use a lazy pushdownSortSuppliers method instead.

sortExprDigest.getDirection().name()));

Script script = scriptExpr.getScript();
if (script != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if script is null? Seems nothing happens. We better add assertion or throw exception if that's never possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never happen. I neglect this check.

case SORT_EXPR -> {
@SuppressWarnings("unchecked")
List<SortExprDigest> sortKeys = (List<SortExprDigest>) operation.digest();
dCpu += NumberUtil.multiply(dRows, 1.1 * sortKeys.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should filter the simple expr.

Copy link
Contributor Author

@songkant-aws songkant-aws Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, it will only count complex expr. The calculation is fine grained.

Signed-off-by: Songkan Tang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Calcite Engine Framework: pushdown sort with non-reference expression

4 participants